Annotation and analysis of disfluencies in a spontaneous speech corpus in Spanish

نویسندگان

  • Luis Javier Rodríguez-Fuentes
  • M. Inés Torres
  • Amparo Varona
چکیده

A new database consisting of 227 dialogues in Spanish was annotated with disfluencies. Then a detailed analysis of the annotations was carried out. The database had been recorded according to the well known Wizard of Oz paradigm. Seventy-five speakers were given each one three different scenarios to make queries about timetables, prices and other conditions of train travels between two spanish cities. The notion of disfluency was relaxed to include any acoustic, lexical or syntactic feature that distinguises spontaneous from read speech. A specific XML annotation scheme was developed. A simple text editor was used to insert marks, and a specific parser was implemented to find errors in annotations. The analysis of annotations revealed that disfluencies were not uniformly distributed among either user turns or speakers. Most disfluencies were grouped into certain user turns, especially the first one. On the other hand, some speakers were remarkably more prone to hesitate, repeat or correct fragments of speech than others.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic detection and annotation of disfluencies in spoken French corpora

In this paper we propose a multi-step system for the semiautomatic detection and annotation of disfluencies in spoken corpora. A set of rules, statistical models and machine learning techniques are applied to the input, which is a transcription aligned to the speech signal. The system uses the results of an automatic estimation of prosodic, part-of-speech and shallow syntactic features. We pres...

متن کامل

Annotation and analysis of overlapping speech in political interviews

Looking for a better understanding of spontaneous speech-related phenomena and to improve automatic speech recognition (ASR), we present here a study on the relationship between the occurrence of overlapping speech segments and disfluencies (filled pauses, repetitions, revisions) in political interviews. First we present our data, and our overlap annotation scheme. We detail our choice of overl...

متن کامل

A Corpus of Spontaneous Speech in Lectures: The KIT Lecture Corpus for Spoken Language Processing and Translation

With the increasing number of applications handling spontaneous speech, the needs to process spoken languages become stronger. Speech disfluency is one of the most challenging tasks to deal with in automatic speech processing. As most applications are trained with well-formed, written texts, many issues arise when processing spontaneous speech due to its distinctive characteristics. Therefore, ...

متن کامل

Cultural Influence on the Expression of Cathartic Conceptualization in English and Spanish: A Corpus-Based Analysis

This paper investigates the conceptualization of emotional release from a cognitive linguistics perspective (Cognitive Metaphor Theory). The metaphor weeping is a means of liberating contained emotions is grounded in universal embodied cognition and is reflected in linguistic expressions in English and Spanish. Lexicalization patterns which encapsulate this conceptualization i...

متن کامل

DisMo: A Morphosyntactic, Disfluency and Multi-Word Unit Annotator. An Evaluation on a Corpus of French Spontaneous and Read Speech

We present DisMo, a multi-level annotator for spoken language corpora that integrates part-of-speech tagging with basic disfluency detection and annotation, and multi-word unit recognition. DisMo is a hybrid system that uses a combination of lexical resources, rules, and statistical models based on Conditional Random Fields (CRF). In this paper, we present the first public version of DisMo for ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001